Collaborating Authors

Columbia University


Why Soccer Still Defies Statistical Analysis

WIRED

Sarah Rudd, who once ran analytics for Arsenal, made her name applying the tenets of probability theory to movements on the pitch. Even she admits not everything can be solved with data. The role of advanced analytics in sports is a contentious subject. To its defenders, data-driven pragmatism is a natural evolutionary step in the way we play and watch games. For detractors, the approach prioritizes results above all else and drains the soul from a pursuit that should be spontaneous and joyful.



It's Causing People to Lose Jobs, Shatter Relationships, and Drain Their Savings. One Support Group Is Sounding the Alarm.

Slate

A.I.-related psychosis has cost people their marriages, life savings, and grip on reality. Last August, Adam Thomas found himself wandering the dunes of Christmas Valley, Oregon, after a chatbot kept suggesting he mystically "follow the pattern" of his own consciousness. Thomas was running on very little sleep--he'd been talking to his chatbot around the clock for months by that point, asking it to help improve his life. Instead it sent him on empty assignments, like meandering the vacuous desert sprawl. He'd lost his job as a funeral director and was living out of a van, draining his savings, and now he found himself stranded in the desert. When he woke up outside on a stranger's futon with no money to his name, he knew he'd hit rock bottom. "I wasn't aware of the dangers at the time, and I thought that the A.I. had statistical analysis abilities that would allow it to assist me if I opened up about my life," Thomas told me.


Mechanics of Learned Reasoning 1: TempoBench, A Benchmark for Interpretable Deconstruction of Reasoning System Performance

Holzer, Nikolaus, Fishell, William, Ray, Baishakhi, Santolucito, Mark

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly excelling and outpacing human performance on many tasks. However, to improve LLM reasoning, researchers either rely on ad-hoc generated datasets or formal mathematical proof systems such as the Lean proof assistant. Whilst ad-hoc generated datasets can capture the decision chains of real-world reasoning processes, they may encode some inadvertent bias in the space of reasoning they cover; they also cannot be formally verified. On the other hand, systems like Lean can guarantee verifiability, but are not well-suited to capturing the nature of agentic, decision-chain-based tasks. This creates a gap both in performance for applications such as business agents or code assistants, and in the usefulness of LLM reasoning benchmarks, which fall short in reasoning structure or real-world alignment. We introduce TempoBench, the first formally grounded and verifiable diagnostic benchmark that parametrizes difficulty to systematically analyze how LLMs perform reasoning. TempoBench uses two evaluation benchmarks to break down reasoning ability. First, temporal trace evaluation (TTE) tests the ability of an LLM to understand and simulate the execution of a given multi-step reasoning system. Second, temporal causal evaluation (TCE) tests an LLM's ability to perform multi-step causal reasoning and to distill cause-and-effect relations from complex systems. We find that models score 65.6% on TCE-normal, and 7.5% on TCE-hard. This shows that state-of-the-art LLMs clearly understand the TCE task but perform poorly as system complexity increases. Our code is available at our GitHub repository: https://github.com/nik-hz/tempobench
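
To make the idea of trace evaluation concrete, here is a minimal sketch of simulating a small state machine over an input sequence and scoring a predicted trace against the reference. Everything in it is an assumption made for illustration: the toy request/grant machine, the names `run_trace` and `score_prediction`, and the per-step accuracy metric are hypothetical and are not taken from TempoBench, whose task format and scoring are not described in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical toy system (not from TempoBench): a two-state controller.
# Maps (state, input) -> (next_state, output).
Machine = Dict[Tuple[str, str], Tuple[str, str]]
Step = Tuple[str, str, str]  # (state before the step, input, output)

TOY_MACHINE: Machine = {
    ("idle", "request"): ("busy", "grant"),
    ("idle", "none"): ("idle", "wait"),
    ("busy", "request"): ("busy", "deny"),
    ("busy", "none"): ("idle", "release"),
}


def run_trace(machine: Machine, start: str, inputs: List[str]) -> List[Step]:
    """Execute the machine on the inputs and record every step."""
    state = start
    trace: List[Step] = []
    for symbol in inputs:
        next_state, output = machine[(state, symbol)]
        trace.append((state, symbol, output))
        state = next_state
    return trace


def score_prediction(predicted: List[Step], reference: List[Step]) -> float:
    """Fraction of reference steps that the prediction reproduces exactly."""
    if not reference:
        return 1.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


if __name__ == "__main__":
    reference = run_trace(TOY_MACHINE, "idle", ["request", "request", "none", "none"])
    for step in reference:
        print(step)
```

A model's answer would be parsed into the same list-of-steps shape and passed to `score_prediction`; because the reference comes from executing the machine, it can be checked mechanically.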



Rank-Induced PL Mirror Descent: A Rank-Faithful Second-Order Algorithm for Sleeping Experts

Zhang, Tiantian

arXiv.org Artificial Intelligence

We introduce a new algorithm, Rank-Induced Plackett-Luce Mirror Descent (RIPLM), which leverages the structural equivalence between the rank benchmark and the distributional benchmark established in Bergam et al. (2022). Unlike prior approaches that operate on expert identities, RIPLM updates directly in the rank-induced Plackett-Luce (PL) parameterization. This ensures that the algorithm's played distributions remain within the class of rank-induced distributions at every round, preserving the equivalence with the rank benchmark. To our knowledge, RIPLM is the first algorithm that is both (i) rank-faithful and (ii) variance-adaptive in the sleeping experts setting.
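
For readers unfamiliar with the Plackett-Luce model, the sketch below shows how positive weights induce a distribution over rankings, and how restricting to the experts that are awake yields a distribution over which of them is ranked first. It illustrates only the standard PL model, not the RIPLM update, which the abstract does not specify; the function names and example weights are hypothetical.

```python
from typing import Dict, Iterable, List


def pl_ranking_probability(weights: Dict[str, float], ranking: List[str]) -> float:
    """Probability of a full ranking under Plackett-Luce.

    Items are drawn one at a time without replacement, each with probability
    proportional to its weight among the items not yet placed.
    """
    remaining = sum(weights[item] for item in ranking)
    prob = 1.0
    for item in ranking:
        prob *= weights[item] / remaining
        remaining -= weights[item]
    return prob


def top_awake_distribution(weights: Dict[str, float], awake: Iterable[str]) -> Dict[str, float]:
    """Probability that each awake expert is ranked first among the awake set.

    Under PL this is the expert's weight normalized over the awake experts.
    """
    awake = list(awake)
    total = sum(weights[expert] for expert in awake)
    return {expert: weights[expert] / total for expert in awake}


if __name__ == "__main__":
    w = {"a": 2.0, "b": 1.0, "c": 1.0}  # hypothetical expert weights
    print(pl_ranking_probability(w, ["a", "b", "c"]))
    print(top_awake_distribution(w, ["b", "c"]))
```

With these weights, the ranking a, b, c has probability (2/4)(1/2)(1/1), and when only b and c are awake each is the top-ranked awake expert with equal probability.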


Spatiotemporally Consistent Indoor Lighting Estimation with Diffusion Priors

Tong, Mutian, Wu, Rundi, Zheng, Changxi

arXiv.org Artificial Intelligence

Indoor lighting estimation from a single image or video remains a challenge due to its highly ill-posed nature, especially when the lighting condition of the scene varies spatially and temporally. We propose a method that estimates, from an input video, a continuous light field describing the spatiotemporally varying lighting of the scene. We leverage 2D diffusion priors to optimize this light field, which is represented as an MLP. To enable zero-shot generalization to in-the-wild scenes, we fine-tune a pre-trained image diffusion model to predict lighting at multiple locations by jointly inpainting multiple chrome balls as light probes. We evaluate our method on indoor lighting estimation from a single image or video and show superior performance over the compared baselines. Most importantly, we highlight results on spatiotemporally consistent lighting estimation from in-the-wild videos, which is rarely demonstrated in previous works.


LLM-based Realistic Safety-Critical Driving Video Generation

Fu, Yongjie, Zha, Ruijian, Tian, Pei, Di, Xuan

arXiv.org Artificial Intelligence

Designing diverse and safety-critical driving scenarios is essential for evaluating autonomous driving systems. In this paper, we propose a novel framework that leverages Large Language Models (LLMs) for few-shot code generation to automatically synthesize driving scenarios within the CARLA simulator, which offers flexible scenario scripting, efficient code-based control of traffic participants, and enforcement of realistic physical dynamics. Given a few example prompts and code samples, the LLM generates safety-critical scenario scripts that specify the behavior and placement of traffic participants, with a particular focus on collision events. To bridge the gap between simulation and real-world appearance, we integrate a video generation pipeline using Cosmos-Transfer1 with ControlNet, which converts rendered scenes into realistic driving videos. Our approach enables controllable scenario generation and facilitates the creation of rare but critical edge cases, such as pedestrian crossings under occlusion or sudden vehicle cut-ins. Experimental results demonstrate the effectiveness of our method in generating a wide range of realistic, diverse, and safety-critical scenarios, offering a promising tool for simulation-based testing of autonomous vehicles.


LogicLearner: A Tool for the Guided Practice of Propositional Logic Proofs

Inamdar, Amogh, Macar, Uzay, Vazirani, Michel, Tarnow, Michael, Mustapha, Zarina, Dittren, Natalia, Sadeh, Sam, Verma, Nakul, Salleb-Aouissi, Ansaf

arXiv.org Artificial Intelligence

The study of propositional logic -- fundamental to the theory of computing -- is a cornerstone of the undergraduate computer science curriculum. Learning to solve logical proofs requires repeated guided practice, but undergraduate students often lack access to on-demand tutoring in a judgment-free environment. In this work, we highlight the need for guided practice tools in undergraduate mathematics education and outline the desiderata of an effective practice tool. We accordingly develop LogicLearner, a web application for guided logic proof practice. LogicLearner consists of an interface to attempt logic proofs step-by-step and an automated proof solver to generate solutions on the fly, allowing users to request guidance as needed. We pilot LogicLearner as a practice tool in two semesters of an undergraduate discrete mathematics course and receive strongly positive feedback for usability and pedagogical value in student surveys. To the best of our knowledge, LogicLearner is the only learning tool that provides an end-to-end practice environment for logic proofs with immediate, judgment-free feedback.
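
As an illustration of what checking a step-by-step logic proof can involve, the sketch below verifies that each line of an equivalence proof is logically equivalent to the previous one by brute-force truth-table enumeration. This is a substituted technique chosen for brevity: the abstract does not describe how LogicLearner's solver or checker works, and the names `equivalent`, `implies`, and `first_invalid_step` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Optional

Formula = Callable[..., bool]  # a formula is a function of its variables


def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q."""
    return (not p) or q


def equivalent(f: Formula, g: Formula, num_vars: int) -> bool:
    """True if f and g agree on every truth assignment."""
    return all(
        f(*values) == g(*values)
        for values in product([False, True], repeat=num_vars)
    )


def first_invalid_step(steps: List[Formula], num_vars: int) -> Optional[int]:
    """Index of the first step not equivalent to its predecessor, else None."""
    for i in range(1, len(steps)):
        if not equivalent(steps[i - 1], steps[i], num_vars):
            return i
    return None


if __name__ == "__main__":
    # not (p and q)  ==  (not p) or (not q)  ==  p -> not q
    proof = [
        lambda p, q: not (p and q),
        lambda p, q: (not p) or (not q),
        lambda p, q: implies(p, not q),
    ]
    print(first_invalid_step(proof, 2))
```

Truth tables grow exponentially with the number of variables, so this approach only suits the small formulas typical of introductory exercises; a tool that names the rewrite rule used at each step would need a rule-based checker instead.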


Like babies and dancers, this robot learns from studying itself

Popular Science

Researchers from Columbia University have successfully developed an autonomous robot arm capable of learning new motions and adapting to damage simply by watching itself move. The robot observed a video of itself and then used that data to plan its next actions--a practice the researchers refer to as "kinematic self-awareness." This unique learning process is designed to mimic the way humans adjust certain movements by watching themselves in a mirror. Teaching robots to learn this way could reduce the need for extensive training in bespoke 3D simulations. It could also one day make future autonomous robots operating in the real world better equipped to adapt to damage and environmental changes without constant human intervention.